Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Abstract
This paper presents specifications and requirements for the creation and validation of large lexica needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The language resources are created and validated within the scope of the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexica with phonetic, suprasegmental and morphosyntactic content will be provided with well-documented specifications for 13 languages. A short summary of the LC-STAR project itself is presented. An overview of the specification for corpora collection and word extraction is given, together with the specification and format of the lexica. Particular attention is paid to the validation of the produced lexica and the lessons learnt during pre-validation. The created and validated language resources will be available via ELRA/ELDA.
Similar resources
Lexicon and Corpora for Speech to Speech Translation (LC-STAR)
The objective of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) is corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). During the lifetime of the project (2002-2005) these lexica will be specified, built and validated. Large lexica co...
Large lexica for speech-to-speech translation: from specification to creation
This paper presents the corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). These lexica will be specified, built and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexic...
LC-STAR II: Starring more Lexica
LC-STAR II is a follow-up project of the EU-funded project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components, IST-2001-32216). LC-STAR II develops large lexica containing information for speech processing in ten languages targeting especially automatic speech recognition and text to speech synthesis but also other applications like speech-to-speech translation and taggin...
Lexica and corpora for speech-to-speech translation: a trilingual approach
Creation of lexica and corpora for Catalan, Spanish and US-English is described. A lexicon is being created for speech recognition and synthesis including relevant information. The lexicon contains 50K common words selected to achieve a wide coverage on the chosen domains, and 50K additional entries including special application words, and proper nouns. Furthermore, a large trilingual spontaneo...
Creating Slovenian Language Resources for Development of Speech-to-speech Translation Components
This article gives detailed information about the procedures for building Slovenian lexica within the LC-STAR project and about the size of those lexica. The University of Maribor joined the LC-STAR project to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language. The lexica consist of three parts: 65,000 common w...
Publication date: 2004